This tutorial introduces the associationsubgraphs package and walks through a complete analysis, from the structure of the input data to the final visualization, using an example data set.
devtools::install_github("tbilab/associationsubgraphs")
library(tidyverse)
library(associationsubgraphs)

We'll use the Phecode pairs data included in the associationsubgraphs package as an example. Your own input data should follow the same format: a data frame with columns a and b identifying the two variables (nodes) in each pair, and a numeric column strength that measures the strength of their association (higher = stronger).
Strength represents how strongly two variables are associated with each other. In this example, the node pairs are Phecode pairs, and the strength of an association can be measured by the odds ratio from a 2 by 2 contingency table. Remove any node pairs with a missing (NA) strength from the data set before continuing.
associationsubgraphs can handle large input data sets, such as the example Phecode pairs data, which contains 1,462,623 pairs.
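As a rough illustration of the input format, the sketch below computes an odds ratio from a 2 by 2 contingency table and assembles a small data frame with the columns a, b, and strength. The counts, node names, and the other strength values are made up for illustration and are not taken from the Phecode data.

```r
# Hypothetical 2 by 2 counts for two conditions A and B (illustrative numbers only)
counts <- matrix(
  c(30, 70,
    20, 880),
  nrow = 2, byrow = TRUE,
  dimnames = list(A = c("yes", "no"), B = c("yes", "no"))
)

# Odds ratio: (both present * both absent) / (A only * B only)
odds_ratio <- (counts["yes", "yes"] * counts["no", "no"]) /
  (counts["yes", "no"] * counts["no", "yes"])

# Toy input in the expected format: one row per node pair
toy_pairs <- data.frame(
  a = c("A", "A", "B"),
  b = c("B", "C", "C"),
  strength = c(odds_ratio, NA, 2.5),
  stringsAsFactors = FALSE
)

# Drop pairs with a missing strength, then sort strongest first
toy_pairs <- toy_pairs[!is.na(toy_pairs$strength), ]
toy_pairs <- toy_pairs[order(toy_pairs$strength, decreasing = TRUE), ]
```

Any other association measure can take the place of the odds ratio, as long as larger values mean stronger associations.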
# load the example data set
data("phecode_pairs")
phecode_pairs <- phecode_pairs %>%
  arrange(desc(strength)) %>% # sort by strength in descending order
  filter(!is.na(strength)) # drop node pairs with a missing strength
# dimensions of the input data
dim(phecode_pairs)
## [1] 1462623 3
| a | b | strength |
|---|---|---|
| 296.00 | 300.00 | 131.0360 |
| 381.20 | 389.00 | 129.1788 |
| 173.00 | 216.00 | 127.6532 |
| 173.00 | 702.00 | 125.0173 |
| 636.00 | 655.00 | 122.8753 |
| 381.00 | 389.00 | 122.6985 |
Node annotations are supplied as a data frame with a column id that corresponds to the variables coded in a and b of the Phecode pairs data, together with any additional information about the Phecodes (nodes). Here, a color and a Phecode category are added to each Phecode. This information is shown in the description table that appears when you click a subgraph to see its details.
# prepare the annotation data
# (phecode_def, a table of Phecode definitions, is assumed to be available in the session)
annotate_node <- c(phecode_pairs$a, phecode_pairs$b) %>%
  unique() %>%
  as_tibble() %>%
  rename(id = value) %>% # name the node column "id"
  left_join(
    phecode_def %>%
      dplyr::select(phecode, description, group, color) %>%
      dplyr::rename(id = phecode),
    by = "id"
  ) %>% # add the description, category, and color of each Phecode
  arrange(group)
# overview of the annotation data
head(annotate_node) %>%
  knitr::kable()

| id | description | group | color |
|---|---|---|---|
| 401.22 | Hypertensive chronic kidney disease | circulatory system | #D14285 |
| 394.00 | Rheumatic disease of the heart valves | circulatory system | #D14285 |
| 411.40 | Coronary atherosclerosis | circulatory system | #D14285 |
| 425.00 | Cardiomyopathy | circulatory system | #D14285 |
| 426.91 | Cardiac pacemaker in situ | circulatory system | #D14285 |
| 425.10 | Primary/intrinsic cardiomyopathies | circulatory system | #D14285 |
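Nodes that have no match in the definition table end up with missing annotation values after the join, so a quick coverage check can be useful. A minimal sketch with made-up data follows; toy_pairs and toy_info are hypothetical stand-ins for the pairs and annotation data.

```r
# Hypothetical pairs data (strength values are illustrative)
toy_pairs <- data.frame(
  a = c(296.0, 381.2, 173.0),
  b = c(300.0, 389.0, 216.0),
  strength = c(3.1, 2.4, 1.7)
)

# Hypothetical annotation data that is missing node 216
toy_info <- data.frame(
  id = c(296.0, 300.0, 381.2, 389.0, 173.0),
  color = "#D14285"
)

# Every node that appears on either side of a pair
node_ids <- unique(c(toy_pairs$a, toy_pairs$b))

# Nodes without an annotation row
missing_ids <- setdiff(node_ids, toy_info$id)
if (length(missing_ids) > 0) {
  message("Nodes without annotation: ", paste(missing_ids, collapse = ", "))
}
```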
Calculating the subgraph structure for downstream visualization
Use calculate_subgraph_structure() to calculate the subgraph structure for downstream visualization. The subgraph structure is the set of subgraphs formed at each strength value as associations are added one at a time in descending order of strength.
# calculate the subgraph structure
subgraphs <- phecode_pairs %>%
  calculate_subgraph_structure()
# overview of the subgraph data
subgraphs %>%
  dplyr::select(-subgraphs) %>%
  head() %>%
  knitr::kable()

| step | n_edges | strength | n_nodes_seen | n_subgraphs | max_size | rel_max_size | avg_size | avg_density | n_triples |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 131.0360 | 2 | 1 | 2 | 1.0000000 | 2.000000 | 1.0000000 | 0 |
| 2 | 2 | 129.1788 | 4 | 2 | 2 | 0.5000000 | 2.000000 | 1.0000000 | 0 |
| 3 | 3 | 127.6532 | 6 | 3 | 2 | 0.3333333 | 2.000000 | 1.0000000 | 0 |
| 4 | 4 | 125.0173 | 7 | 3 | 3 | 0.4285714 | 2.333333 | 0.8888889 | 1 |
| 5 | 5 | 122.8753 | 9 | 4 | 3 | 0.3333333 | 2.250000 | 0.9166667 | 1 |
| 6 | 6 | 122.6985 | 10 | 4 | 3 | 0.3000000 | 2.500000 | 0.8333333 | 2 |
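To make the table above more concrete, the sketch below adds the six strongest edges one at a time and tracks which nodes belong to the same subgraph. It is a toy illustration of the idea, not the package's implementation; track_subgraphs() is a hypothetical helper, and it only covers the columns n_nodes_seen, n_subgraphs, max_size, rel_max_size, and avg_size.

```r
# The six strongest edges from the example data, already in descending order
edges <- data.frame(
  a = c("296.00", "381.20", "173.00", "173.00", "636.00", "381.00"),
  b = c("300.00", "389.00", "216.00", "702.00", "655.00", "389.00"),
  strength = c(131.0360, 129.1788, 127.6532, 125.0173, 122.8753, 122.6985),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add edges one by one and summarise the subgraphs
track_subgraphs <- function(edges) {
  membership <- character(0) # named vector: node -> subgraph label
  steps <- vector("list", nrow(edges))
  for (i in seq_len(nrow(edges))) {
    a <- edges$a[i]
    b <- edges$b[i]
    # A node seen for the first time starts in its own subgraph
    for (node in c(a, b)) {
      if (!(node %in% names(membership))) membership[node] <- node
    }
    # The new edge merges the subgraphs of its two end points
    membership[membership == membership[[b]]] <- membership[[a]]
    sizes <- table(membership)
    steps[[i]] <- data.frame(
      step = i,
      strength = edges$strength[i],
      n_nodes_seen = length(membership),
      n_subgraphs = length(sizes),
      max_size = as.integer(max(sizes))
    )
  }
  result <- do.call(rbind, steps)
  result$rel_max_size <- result$max_size / result$n_nodes_seen
  result$avg_size <- result$n_nodes_seen / result$n_subgraphs
  result
}

toy_structure <- track_subgraphs(edges)
```

With these six edges, the tracked columns agree with the corresponding columns in the first six rows of the table.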
Prepare data for downstream visualization
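To label nodes by name rather than by code, convert columns a and b of the Phecode pairs data to Phecode descriptions, and convert the id column in the annotation data to Phecode descriptions as well. The annotation table is shown when you click a subgraph.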
# convert Phecodes to Phecode descriptions
phecode_pairs <- phecode_pairs %>%
  rename(phecode = a) %>%
  left_join(., phecode_def[, c("phecode", "description")], by = "phecode") %>%
  rename(a = description) %>%
  dplyr::select(-phecode) %>%
  rename(phecode = b) %>%
  left_join(., phecode_def[, c("phecode", "description")], by = "phecode") %>%
  rename(b = description) %>%
  dplyr::select(-phecode)
# overview of the updated Phecode pairs data
phecode_pairs %>%
  head() %>%
  knitr::kable()

| strength | a | b |
|---|---|---|
| 131.0360 | Mood disorders | Anxiety disorders |
| 129.1788 | Eustachian tube disorders | Hearing loss |
| 127.6532 | Neoplasm of uncertain behavior of skin | Benign neoplasm of skin |
| 125.0173 | Neoplasm of uncertain behavior of skin | Degenerative skin conditions and other dermatoses |
| 122.8753 | Early or threatened labor; hemorrhage in early pregnancy | Known or suspected fetal abnormality affecting management of mother |
| 122.6985 | Otitis media and Eustachian tube disorders | Hearing loss |
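The same code-to-description conversion can also be written as a lookup with match(), which makes it easy to spot codes that have no description. The sketch below uses a small made-up definition table (toy_def) holding three of the descriptions shown above; to_description() is a hypothetical helper, and this is an alternative illustration rather than the code used in this tutorial.

```r
# Hypothetical, deliberately incomplete definition table (381.2 is left out)
toy_def <- data.frame(
  phecode = c(296, 300, 389),
  description = c("Mood disorders", "Anxiety disorders", "Hearing loss"),
  stringsAsFactors = FALSE
)

toy_pairs <- data.frame(
  a = c(296, 381.2),
  b = c(300, 389),
  strength = c(131.0360, 129.1788)
)

# Hypothetical helper: look up the description for each code (NA if absent)
to_description <- function(codes, def) {
  def$description[match(codes, def$phecode)]
}

toy_pairs$a <- to_description(toy_pairs$a, toy_def)
toy_pairs$b <- to_description(toy_pairs$b, toy_def)

# Flag pairs in which at least one code had no description
unmatched <- is.na(toy_pairs$a) | is.na(toy_pairs$b)
```

Pairs flagged in unmatched would show up as unnamed nodes, so it is worth resolving them before visualizing.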
# update the annotation data as well
annotate_node <- annotate_node %>%
  dplyr::select(-id) %>%
  rename(id = description)

Final visualization
# visualize
visualize_subgraph_structure(
  phecode_pairs,
  node_info = annotate_node,
  subgraph_results = subgraphs,
  trim_subgraph_results = TRUE
)

Highlighting a node of interest
To highlight a node of interest, for example Calculus of kidney, supply its id, "Calculus of kidney", to the pinned_node argument of visualize_subgraph_structure(). The visualization then opens at the point where Calculus of kidney first gets grouped into a subgraph.
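Before pinning a node, it can help to confirm that its id appears in the pairs data exactly as written, and to see roughly where it enters the structure. The sketch below uses three rows from the converted example data and pins "Hearing loss" instead; it assumes that a node joins a subgraph at the strongest association that involves it, which follows from associations being added in descending order of strength.

```r
# Three pairs taken from the converted example data
toy_pairs <- data.frame(
  a = c("Mood disorders", "Eustachian tube disorders",
        "Neoplasm of uncertain behavior of skin"),
  b = c("Anxiety disorders", "Hearing loss", "Benign neoplasm of skin"),
  strength = c(131.0360, 129.1788, 127.6532),
  stringsAsFactors = FALSE
)

pinned <- "Hearing loss"

# The pinned id must match a node id exactly
stopifnot(pinned %in% c(toy_pairs$a, toy_pairs$b))

# Sort strongest first, then find the first association that involves the node
toy_pairs <- toy_pairs[order(toy_pairs$strength, decreasing = TRUE), ]
first_step <- which(toy_pairs$a == pinned | toy_pairs$b == pinned)[1]
first_strength <- toy_pairs$strength[first_step]
```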
# visualize, opening at the pinned node
visualize_subgraph_structure(
  phecode_pairs,
  node_info = annotate_node,
  subgraph_results = subgraphs,
  trim_subgraph_results = TRUE,
  pinned_node = "Calculus of kidney"
)

visualize_subgraph_structure() creates an R htmlwidget, built with r2d3, to host the visualization. You can therefore include this code directly in an .Rmd file and knit it to an HTML file to produce publishable web content.